Bayesian Variable Selection for Probit Mixed Models
Author
Abstract
In computational biology, gene expression datasets are characterized by very few individual samples compared to a large number of measurements per sample. Thus, it is appealing to merge these datasets in order to increase the number of observations and diversify the data, allowing a more reliable selection of genes relevant to the biological problem. This necessitates the introduction of the dataset as a random effect. Extending previous work of Lee et al. (2003), a method is proposed to select relevant variables among tens of thousands in a probit mixed regression model, considered as part of a larger hierarchical Bayesian model. Latent variables are used to identify subsets of selected variables, and the collapsing technique of Liu (1994) is combined with a Metropolis-within-Gibbs algorithm (Robert and Casella, 2004). The method is applied to a merged dataset composed of three individual gene expression datasets, in which tens of thousands of measurements are available for each of several hundred human breast cancer samples. Even for this large dataset comprising around 20000 predictors, the method is shown to be efficient and feasible. As a demonstration, it is used to select the most important genes that characterize the estrogen receptor status of the cancer patients.
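To make the idea of a collapsed Metropolis-within-Gibbs move on the variable-inclusion indicators more concrete, here is a minimal sketch. It is not the authors' implementation, and the abstract does not specify the priors. The sketch assumes a Gaussian prior with variance `c` on the selected coefficients, a Gaussian dataset random effect with variance `tau2`, an independent Bernoulli prior `prior_incl` on each indicator, and a continuous latent response `latent` that a full sampler would redraw from a truncated normal given the binary outcomes. All function and parameter names are hypothetical.

```python
import numpy as np


def log_marginal(latent, X, Z, gamma, c=10.0, tau2=1.0):
    """Log density of the latent response with coefficients and random
    effects integrated out (the 'collapsed' form, under the assumed priors):
    latent ~ N(0, I + tau2 * Z Z^T + c * X_g X_g^T)."""
    n = latent.shape[0]
    X_sel = X[:, gamma]  # columns currently included in the model
    cov = np.eye(n) + tau2 * (Z @ Z.T) + c * (X_sel @ X_sel.T)
    _, logdet = np.linalg.slogdet(cov)
    quad = latent @ np.linalg.solve(cov, latent)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)


def metropolis_flip(latent, X, Z, gamma, rng, prior_incl=0.05):
    """One Metropolis move: propose flipping a single inclusion indicator
    and accept or reject it using the collapsed marginal density."""
    j = rng.integers(X.shape[1])
    proposal = gamma.copy()
    proposal[j] = not proposal[j]

    # Prior odds of including versus excluding one variable (assumed prior).
    log_odds = np.log(prior_incl) - np.log1p(-prior_incl)
    log_prior_ratio = log_odds if proposal[j] else -log_odds

    log_ratio = (log_marginal(latent, X, Z, proposal)
                 - log_marginal(latent, X, Z, gamma)
                 + log_prior_ratio)
    # The proposal is symmetric, so no proposal correction is needed.
    if np.log(rng.uniform()) < log_ratio:
        return proposal
    return gamma
```

In a full sampler this move would alternate with draws of the latent response and of the random-effect variance. The dense n-by-n covariance used here is only practical because n (samples) is small relative to the number of predictors.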
Similar resources
Adaptive Monte Carlo for Bayesian Variable Selection in Regression Models
This article describes a method for efficient posterior simulation for Bayesian variable selection in Generalized Linear Models with many regressors but few observations. A proposal on model space is described which contains a tuneable parameter. An adaptive approach to choosing this tuning parameter is described which allows automatic, efficient computation in these models. The method is appli...
A study of variable selection using g-prior distribution with ridge parameter
In the Bayesian stochastic search variable selection framework, a common prior distribution for the regression coefficients is the g-prior of Zellner [1986]. However, there are two standard cases in which the associated covariance matrix does not exist, and the conventional prior of Zellner cannot be used: if the number of observations is lower than the number of variables (large p and small n...
Bayesian Variable Selection for Probit Mixed Models Applied to Gene Selection
In computational biology, gene expression datasets are characterized by very few individual samples compared to a large number of measurements per sample. Thus, it is appealing to merge these datasets in order to increase the number of observations and diversify the data, allowing a more reliable selection of genes relevant to the biological problem. Besides, the increased size of a merged data...
A Multivariate Probit Latent Variable Model for Analyzing Dichotomous Responses
We propose a multivariate probit model that is defined by a confirmatory factor analysis model with covariates for analyzing dichotomous data in medical research. Our proposal is a generalization of several useful multivariate probit models, and provides a flexible framework for practical applications. We implement a Monte Carlo EM algorithm for maximum likelihood estimation of the model, and d...
Bayesian Variable Selections for Probit Models with Componentwise Gibbs Samplers
For variable selection in binary response regression, stochastic search variable selection and the Bayesian Lasso have recently been popular. However, these two variable selection methods suffer from a heavy computational burden caused by hyperparameter tuning and by matrix inversions, especially when the number of covariates is large. Therefore, this article incorporates the componentwise Gibbs sampl...